Abstract

The conventional approach to routing in computer networks consists of using a heuristic to compute a single shortest path from a source to a destination. Single-path routing is very responsive to topological and link-cost changes; however, except under light traffic loads, the delays obtained with this type of routing are far from optimal. Furthermore, if link costs are associated with delays, single-path routing exhibits oscillatory behavior and becomes unstable as traffic loads increase. On the other hand, minimum-delay routing approaches can minimize delays only when traffic is stationary or very slowly changing.

We present a “near-optimal” routing framework that offers delays comparable to those of optimal routing and that is as flexible and responsive as the single-path routing protocols proposed to date. First, an approximation to Gallager's minimum-delay routing problem is derived; then algorithms that implement the approximation scheme are presented and verified. We describe the first routing algorithm based on link-state information that provides multiple paths of unequal cost to each destination that are loop-free at every instant. We show through simulations that the delays obtained in our framework for minimum-delay routing are comparable to those obtained using Gallager's algorithm for minimum-delay routing. We also show that our framework yields far smaller delays and makes better use of resources than traditional single-path routing.


*This work was supported in part at UCSC by the Defense Advanced Research Projects Agency (DARPA) under grants F30602-97-1-0291 and F19628-96-C-0038.
1 Introduction


The standard approach to routing in computer networks today consists of computing a single shortest path from a source to each destination using some heuristic link-cost metric, which is typically not directly associated with the transmission and queueing delays over links and paths. A less common approach is to define routing as an optimization problem (e.g., a multicommodity flow problem [5]) with a specific objective function, such as minimizing delays or maximizing throughput, and to solve the problem using any of several known optimization techniques. These two traditional approaches to routing have inherent strengths and drawbacks.

To provide minimum delays, all optimal routing algorithms require the input traffic and the network topology to be stationary or very slowly changing (quasi-static), and require a priori knowledge of global constants that guarantee convergence of the routing algorithm. This makes optimal routing algorithms impractical for real networks, because in real networks traffic is very bursty at any time scale and the network topology frequently experiences changes. Moreover, global constants that work for all input traffic patterns are impossible to determine.

On the other hand, routing algorithms based on single shortest-path heuristics adapt very quickly to changing network conditions, which makes them far preferable to optimal routing for implementation in real networks. The main shortcoming of single-shortest-path routing is that the delays achievable with such heuristics are far longer than those achievable using optimal routing algorithms. In addition, single-shortest-path routing becomes unstable under heavy loads or very bursty traffic when the link-cost metric used in the routing algorithm is related to the delays or congestion experienced over the links [3].

It has been known for many years that shortest-path routing over single paths is far less efficient than optimal dynamic routing, and that shortest-path routing oscillates when link costs are tied to link delays. However, implementing optimal dynamic routing in a computer network has simply been infeasible to date. The key contributions of this paper are: (a) introducing a new framework for near-optimum-delay routing; (b) verifying, for the first time, a set of invariants that permit routing-algorithm designers to approximate Gallager's necessary and sufficient conditions for minimum-delay routing with loop-free routing conditions that can be achieved using distributed routing algorithms that do not require any global variables or global synchronization; and (c) presenting an example that provides end-to-end delays comparable to the optimal, while being as fast as today's shortest-path routing schemes.

Section 2 presents the minimum-delay routing problem (MDRP) as described by Gallager, and Gallager's minimum-delay routing algorithm [8]. Gallager's algorithm is unsuitable for practical networks and internetworks, because its speed of convergence to the optimal routes depends on a global constant, and because it requires the input traffic and network topology to be stationary or quasi-stationary.

Several algorithms have been proposed to date that improve on Gallager's minimum-delay routing algorithm [2, 6, 23, 24]. Segall and Sidi [23, 24] extended Gallager's minimum-delay routing algorithm to handle topological changes using techniques developed by Merlin and Segall [19]. Cassandras et al. [6] present a better technique for measuring marginal delays. Bertsekas and Gallager [2] used second derivatives to speed up convergence of Gallager's algorithm. However, all these algorithms still depend on global constants and on the requirement that network traffic be static or quasi-static.

Because shortest-path routing oscillates when link costs are related to delays, attempts to improve it have been restricted mainly to using better link-cost metrics (e.g., [18, 13]) or using multiple paths. To avoid undetected loops, OSPF permits multiple paths to a destination only when they have the same length [20]. More recently, Zaumen and Garcia-Luna-Aceves [28] proposed an algorithm based on distance vectors that supports multiple paths of unequal cost to each destination; however, link costs are not tied to delays. Wang and Crowcroft [27] addressed the drawbacks of the shortest-path first (SPF) algorithm by using alternate paths to detour traffic around points of congestion or network failures. However, the alternate paths in SPF-EE (for emergency exits) are computed on a reactive basis, i.e., once congestion occurs, which is less effective in dealing with short bursts of traffic.

Cain et al. [4] describe a routing algorithm for minimizing delays. However, this algorithm requires that the routing-table updates at all the routers be synchronized; otherwise looping occurs, which increases end-to-end delays. Because the synchronization intervals required by this algorithm must be known by all routers, this is akin to using a global constant as in Gallager's algorithm. This approach does not scale to very large networks, because the time needed for routing-table update synchronization becomes large, and this in turn limits its responsiveness to short-term traffic fluctuations. What is seriously lacking in this algorithm is a technique for the asynchronous computation of multiple paths with instantaneous loop-freedom.

Section 3 presents a new framework for approximate solutions to MDRP. The novelty of this framework stems from partitioning the computation of minimum-delay paths into two parts. First, multiple loop-free paths of unequal cost to a destination are established using long-term link-cost information. Then flows to each destination are allocated along the multiple loop-free paths available at each router; this allocation is based on heuristics that attempt to minimize delays using short-term link-cost information.

It is this partitioning of MDRP that permits us to implement routing algorithms that provide routers with near-optimum delays while keeping the routing algorithm as responsive to traffic or topology changes as the best of today's shortest-path routing algorithms. We also present a set of invariants that permits Gallager's necessary and sufficient conditions for minimum-delay routing to be approximated with loop-free routing conditions achievable with simple distributed routing algorithms that do not require any global variables or global synchronization.

Section 4 describes a specific routing algorithm based on our new routing framework. This algorithm consists of two key components: (a) the first link-state routing algorithm that provides multiple loop-free paths of arbitrary positive cost at every instant, and (b) flow-allocation heuristics that approximate minimum delays along the predefined multiple loop-free paths available for each destination.

Section 5 presents results of simulation experiments designed to illustrate the effectiveness of our solution in static and dynamic networks. We compare our approach against the optimal routing approach and against shortest-path routing based on Dijkstra's shortest-path first (SPF) algorithm, because the latter is widely used in the Internet today. The simulation results illustrate that the routing delays obtained with our new algorithm are comparable to the optimal delays. Furthermore, the complexity of implementing our routing framework is similar to that of the routing protocols that provide single-path routing in the Internet today.

2 Minimum Delay Routing

2.1 Problem formulation

The minimum-delay routing problem (MDRP) was first formulated by Gallager [8], and we follow his description in this section. A computer network $G = (N, L)$ consists of a set $N$ of routers and a set $L$ of links between them. Each link is bidirectional with possibly different costs in each direction.

Let $r_j^i \ge 0$ be the expected input traffic, measured in bits per second, entering the network at router $i$ and destined for router $j$. Let $t_j^i$ be the sum of $r_j^i$ and the traffic arriving from the neighbors of $i$ for destination $j$, and let the routing parameter $\phi_{jk}^i$ be the fraction of traffic $t_j^i$ that leaves router $i$ over link $(i,k)$. Assuming that the network does not lose any packets, conservation of traffic gives

$$t_j^i = r_j^i + \sum_{k \in N^i} t_j^k \phi_{ji}^k \tag{1}$$

where $N^i$ is the set of neighbors of router $i$.

Let $f_{ik}$ be the expected traffic, measured in bits per second, on link $(i,k)$. Because $t_j^i \phi_{jk}^i$ is the traffic destined for router $j$ on link $(i,k)$, we have the following equation for $f_{ik}$:

$$f_{ik} = \sum_{j \in N} t_j^i \phi_{jk}^i \tag{2}$$

Note that $0 \le f_{ik} < C_{ik}$, where $C_{ik}$ is the capacity of link $(i,k)$ in bits per second.
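To make Eqs. (1) and (2) concrete, the following sketch computes, for a single destination $j$, the router flows $t_j^i$ and the contribution $t_j^i \phi_{jk}^i$ of that destination to each link flow $f_{ik}$. It assumes the routing parameters are held in a Python dictionary and that the graph of positive fractions is acyclic, so routers can be processed in topological order; the helper name and data layout are hypothetical, not taken from the paper.

```python
from graphlib import TopologicalSorter

def node_and_link_flows(r, phi, dest):
    """Flows for one destination j (hypothetical helper).

    r:    {router i: r_j^i}, input traffic for j entering at i (bits/s)
    phi:  {router i: {neighbor k: phi_jk^i}}; positive fractions must form
          an acyclic graph
    Returns (t, f): t[i] = t_j^i and f[(i, k)] = t_j^i * phi_jk^i, i.e. the
    term contributed by destination j to f_ik in Eq. (2).
    """
    preds = {i: set() for i in r}          # router -> routers that feed it
    for i, out in phi.items():
        preds.setdefault(i, set())
        for k, frac in out.items():
            preds.setdefault(k, set())
            if frac > 0:
                preds[k].add(i)
    t, f = {}, {}
    for i in TopologicalSorter(preds).static_order():
        # Eq. (1): own input plus the traffic that neighbors forward to i.
        t[i] = r.get(i, 0.0) + sum(t[k] * phi[k][i] for k in preds[i])
        if i == dest:
            continue                        # traffic for j is absorbed at j
        for k, frac in phi.get(i, {}).items():
            f[(i, k)] = t[i] * frac         # one term of the sum in Eq. (2)
    return t, f

r = {"A": 10.0, "B": 4.0, "C": 0.0, "D": 0.0}
phi = {"A": {"B": 0.5, "C": 0.5}, "B": {"C": 0.25, "D": 0.75}, "C": {"D": 1.0}}
t, f = node_and_link_flows(r, phi, dest="D")
print(t["D"], f[("C", "D")])   # 14.0 7.25
```

The 14 bits/s arriving at D equal the total input of A and B, as conservation of traffic requires.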

Property 1 For each router $i$ and destination $j$, the routing parameters $\phi_{jk}^i$ must satisfy the following conditions:

1. $\phi_{jk}^i = 0$ if $(i,k) \notin L$ or $i = j$. Clearly, if the link does not exist, there can be no traffic on it.

2. $\phi_{jk}^i \ge 0$. This is true because there can be no negative amount of traffic allocated on a link.

3. $\sum_{k \in N^i} \phi_{jk}^i = 1$ for $i \ne j$. This is a consequence of the fact that all incoming traffic must be allocated to outgoing links.
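The three conditions of Property 1 translate directly into a validity check. This is a minimal sketch under the assumption that the routing parameters for one destination are stored as nested dictionaries and the links as a set of directed pairs; the function name is hypothetical. A small tolerance is used for condition 3 because the fractions are floating-point numbers.

```python
import math

def satisfies_property1(phi, links, dest, tol=1e-9):
    """Check Property 1 for one destination j (hypothetical helper).

    phi:   {router i: {neighbor k: phi_jk^i}}
    links: set of directed links (i, k) in L
    """
    for i, out in phi.items():
        for k, frac in out.items():
            if frac < 0:                                   # condition 2
                return False
            if frac > 0 and ((i, k) not in links or i == dest):
                return False                               # condition 1
        if i != dest and not math.isclose(sum(out.values()), 1.0, abs_tol=tol):
            return False                                   # condition 3
    return True
```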
Let $D_{ik}$ be defined as the expected number of messages or packets per second transmitted on link $(i,k)$ times the expected delay per message or packet, including the queueing delays at the link. We assume that messages are delayed only by the links of the network, and that $D_{ik}$ depends only on the flow $f_{ik}$ through link $(i,k)$ and on link characteristics such as propagation delay and link capacity. $D_{ik}(f_{ik})$ is a continuous and convex function that tends to infinity as $f_{ik}$ approaches $C_{ik}$. The total expected delay per message times the total expected number of message arrivals per second is given by

$$D_T = \sum_{(i,k) \in L} D_{ik}(f_{ik}) \tag{3}$$

Note that the router traffic-flow set $t = \{t_j^i\}$ and the link-flow set $f = \{f_{ik}\}$ can be obtained from $r = \{r_j^i\}$ and $\phi = \{\phi_{jk}^i\}$. Therefore, $D_T$ can be expressed as a function of $r$ and $\phi$ using Eqs. (1) and (2). The minimum-delay routing problem can now be stated as follows:


MDRP: For a given fixed topology, input traffic flow set $r = \{r_j^i\}$, and delay function $D_{ik}(f_{ik})$ for each link $(i,k)$, the minimization problem consists of computing the routing parameter set $\phi$ such that the total expected delay $D_T$ is minimized.
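As an illustration of Eq. (3), the sketch below assumes the delay function $D_{ik}(f) = f/(C_{ik} - f)$, a common M/M/1-style choice that is continuous, convex, and tends to infinity as $f$ approaches $C_{ik}$; the paper does not fix a particular function at this point. It compares sending 8 units of traffic from S to T over one link against splitting it between the direct link and a two-hop path through a hypothetical router A.

```python
import math

def link_delay(f, cap):
    """Assumed delay function D_ik(f) = f / (C_ik - f); infinite at or above capacity."""
    return math.inf if f >= cap else f / (cap - f)

def total_delay(flows, caps):
    """Eq. (3): D_T is the sum of D_ik(f_ik) over all links."""
    return sum(link_delay(flows[link], caps[link]) for link in flows)

caps = {("S", "T"): 10.0, ("S", "A"): 10.0, ("A", "T"): 10.0}
single = {("S", "T"): 8.0, ("S", "A"): 0.0, ("A", "T"): 0.0}
split = {("S", "T"): 4.0, ("S", "A"): 4.0, ("A", "T"): 4.0}
print(total_delay(single, caps), total_delay(split, caps))
```

With this delay function the single-path allocation gives $D_T = 4$, while the split gives $D_T = 2$ (up to floating-point rounding), even though the split uses one extra link; this is the kind of gain that MDRP formalizes.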


2.2 A Minimum Delay Routing Algorithm


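Gallager [8] derived the necessary and sufficient conditions that must be satisfied to solve MDRP. These conditions are summarized in Gallager's Theorem, stated below.

The partial derivatives of the total delay $D_T$ of Eq. (3) with respect to $r$ and $\phi$ play a key role in the formulation and solution of the problem; these derivatives are:

$$\frac{\partial D_T}{\partial r_j^i} = \sum_{k \in N^i} \phi_{jk}^i \left[ D'_{ik}(f_{ik}) + \frac{\partial D_T}{\partial r_j^k} \right] \tag{4}$$

$$\frac{\partial D_T}{\partial \phi_{jk}^i} = t_j^i \left[ D'_{ik}(f_{ik}) + \frac{\partial D_T}{\partial r_j^k} \right] \tag{5}$$

where $D'_{ik}(f_{ik}) = dD_{ik}(f_{ik})/df_{ik}$ is called the marginal delay or incremental delay. Similarly, $\partial D_T / \partial r_j^i$ is called the marginal distance from router $i$ to $j$.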
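Eq. (4) can be evaluated recursively from the destination outward, because the marginal distance from $j$ to itself is zero. The sketch below assumes the illustrative delay function $f/(C - f)$, whose marginal delay is $C/(C - f)^2$, and an acyclic set of routing parameters; the function names are hypothetical.

```python
def marginal_delay(f, cap):
    """D'_ik(f) for the assumed delay f / (C - f), which is C / (C - f)**2."""
    return cap / (cap - f) ** 2

def marginal_distances(phi, flows, caps, dest):
    """Eq. (4): dD_T/dr_j^i = sum_k phi_jk^i * (D'_ik(f_ik) + dD_T/dr_j^k)."""
    memo = {dest: 0.0}                      # marginal distance from j to j
    def dist(i):
        if i not in memo:                   # terminates because the graph is acyclic
            memo[i] = sum(frac * (marginal_delay(flows[(i, k)], caps[(i, k)]) + dist(k))
                          for k, frac in phi[i].items() if frac > 0)
        return memo[i]
    for i in phi:
        dist(i)
    return memo

phi = {"S": {"T": 0.5, "A": 0.5}, "A": {"T": 1.0}}
flows = {("S", "T"): 4.0, ("S", "A"): 4.0, ("A", "T"): 4.0}
caps = {("S", "T"): 10.0, ("S", "A"): 10.0, ("A", "T"): 10.0}
md = marginal_distances(phi, flows, caps, dest="T")
```

Here the bracketed term of Eq. (5) is $10/36$ through neighbor T but $20/36$ through neighbor A, so the partial derivatives with respect to the two routing parameters differ; shifting some of S's traffic from A to the direct link would reduce $D_T$.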

Gallager's Theorem [8]: The necessary condition for a minimum of $D_T$ with respect to $\phi$, for all $i \ne j$ and $(i,k) \in L$, is

$$\frac{\partial D_T}{\partial \phi_{jk}^i} \begin{cases} = \lambda_{ij} & \phi_{jk}^i > 0 \\ \ge \lambda_{ij} & \phi_{jk}^i = 0 \end{cases} \tag{6}$$

where $\lambda_{ij}$ is some positive number, and the sufficient condition to minimize $D_T$ with respect to $\phi$ is that, for all $i \ne j$ and $(i,k) \in L$,

$$D'_{ik}(f_{ik}) + \frac{\partial D_T}{\partial r_j^k} \ge \frac{\partial D_T}{\partial r_j^i} \tag{7}$$

□

Eq. (4) shows the relation between a router's marginal distance to a particular destination and the marginal distances of its neighbors to the same destination. Eqs. (5)-(7) indicate the conditions for perfect load balancing, i.e., when the routing parameter set $\phi$ gives the minimum delay.

The set of neighbors through which router $i$ forwards traffic towards $j$ is denoted by $S_j^i$ and is called the successor set.¹

Under perfect load balancing with respect to a particular destination, the marginal distances through neighbors in the successor set are equal to the marginal distance of the router, and the marginal distances through neighbors not in the successor set are higher than the marginal distance of the router.

Let $D_j^i$ denote the marginal distance from $i$ to $j$, i.e., $\partial D_T / \partial r_j^i$. Let the marginal delay $D'_{ik}(f_{ik})$ from $i$ to $k$ be denoted by $l_k^i$, which is also called the cost of the link from $i$ to $k$.


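According to Gallager's Theorem, the minimum-delay routing problem now becomes one of determining, at each router $i$ for each destination $j$, the routing parameters $\{\phi_{jk}^i\}$, $S_j^i$ and $D_j^i$ such that the following five equations are satisfied:

$$D_j^i = \sum_{k \in N^i} \phi_{jk}^i \left( D_j^k + l_k^i \right) \tag{8}$$

$$S_j^i = \{ k \mid \phi_{jk}^i > 0 \wedge k \in N^i \} \tag{9}$$

$$D_j^i \le D_j^k + l_k^i, \quad k \in N^i \tag{10}$$

$$D_j^p + l_p^i = D_j^q + l_q^i, \quad p, q \in S_j^i \tag{11}$$

$$D_j^p + l_p^i \le D_j^q + l_q^i, \quad p \in S_j^i,\ q \notin S_j^i \tag{12}$$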
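At a single router, Eqs. (9), (11) and (12) can be checked from the routing parameters and the distances through each neighbor. The following is a minimal sketch, assuming the values $D_j^k + l_k^i$ are already known for every neighbor; the function name and tolerance are hypothetical.

```python
def is_load_balanced(phi_i, dist_via, tol=1e-9):
    """Check Eqs. (9), (11) and (12) at one router i for one destination j.

    phi_i:    {neighbor k: phi_jk^i}
    dist_via: {neighbor k: D_j^k + l_k^i} for every neighbor k
    """
    succ = {k for k, frac in phi_i.items() if frac > 0}          # Eq. (9)
    if not succ:
        return False
    best = min(dist_via[k] for k in succ)
    if any(dist_via[k] - best > tol for k in succ):               # Eq. (11)
        return False
    return all(dist_via[q] >= best - tol                          # Eq. (12)
               for q in dist_via if q not in succ)
```

For example, `is_load_balanced({"T": 0.4, "A": 0.6, "B": 0.0}, {"T": 0.3, "A": 0.3, "B": 0.5})` holds, whereas lowering the distance through B to 0.2 violates Eq. (12), because an unused neighbor would then offer a shorter marginal distance.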

This reformulation of MDRP is critical, because it is the first step in allowing us to approach the problem by looking at the next hops and distances obtained at each router for each destination. Gallager [8] described a distributed routing algorithm for solving the above five equations. When the algorithm converges, the aggregate of the successor sets for a given destination $j$ ($S_j^i$ for every $i$) defines a directed acyclic graph. In fact, in any implementation, the successor sets must be loop-free at every instant, because even temporary loops cause traffic to recirculate at some nodes and result in incorrect marginal-delay computations, which in turn can prevent the algorithm from converging or from obtaining minimum delays.

Gallager's distributed algorithm uses an interesting blocking technique to provide loop-freedom at every instant [8, 23, 24]. We refer to this algorithm as OPT in the rest of the paper. Unfortunately, OPT cannot be used in real networks for several reasons. A major drawback of OPT is that a global step size $\eta$ must be chosen and used by every router to ensure convergence. Because $\eta$ depends on the input traffic pattern, it is impossible in practice to determine one value that works for all input traffic patterns and for all possible topology modifications. OPT computes the routing parameters directly, and the multiple loop-free paths are simply implied by the routing parameters through Eq. (9). For all practical purposes, the computation of routing parameters is a very slow process, because it is destination-controlled: the destination initiates every iteration that adjusts the routing parameters at every router, and each iteration takes a time proportional to the diameter of the network and a number of messages proportional to the number of links. This renders the algorithm slow to converge and useful only when traffic and topology remain stationary long enough for all routers to adjust their routing parameters between changes. Also, depending on the global constant $\eta$, the destination must initiate several iterations for the parameters to converge to their final values. The number of such iterations tends to be large for a small $\eta$ and small for a large $\eta$. Unfortunately, $\eta$ cannot be made arbitrarily large to reduce the number of iterations and speed up convergence, because the algorithm may not converge at all for large values of $\eta$.

¹The term successor set was first introduced in [28].

Hence, Gallager's algorithm can be viewed only as a method for obtaining lower bounds under stationary traffic, rather than as an algorithm to be used in practice. The next section shows how the theory introduced in Gallager's method can be adapted to practical networks.

3 A New Framework for Minimum-Delay Routing

We noted that in Gallager's algorithm the computation of the routing parameter set $\phi$ converges slowly and works only in the case of stationary or quasi-stationary traffic. In the Internet, traffic is hardly stationary, and perfect load balancing is neither possible nor necessary. Intuitively, an approximate load-balancing scheme based on some heuristic that can quickly adapt to dynamic traffic should be sufficient to reduce delays substantially.

The key idea in our approach is, in a sense, to reverse the way in which Gallager's algorithm solves MDRP. The intuition behind our approach is that establishing paths from sources to destinations takes much longer than shifting loads from one set of neighbors to another, simply because of the propagation and processing delays incurred along the paths. Accordingly, it makes sense to first establish multiple loop-free paths using long-term (end-to-end) delay information, and then adjust routing parameters along the predefined multiple paths using short-term (local) delay information.

This new approach allows us to use distributed algorithms to compute multiple loop-free paths from source to destination that, hopefully, are as fast as today's single-path routing algorithms, together with local heuristics that can respond quickly to temporary traffic bursts using local short-term metrics alone. Therefore, we map Eqs. (8)-(12) derived in Gallager's method into the following three equations:

$$D_j^i = \min\{ D_j^k + l_k^i \mid k \in N^i \} \tag{13}$$

$$S_j^i = \{ k \mid D_j^k < D_j^i \wedge k \in N^i \} \tag{14}$$

$$\phi_{jk}^i = \Psi(k, A_j^i, B_j^i), \quad k \in N^i \tag{15}$$

where $A_j^i = \{ D_j^p + l_p^i \mid p \in N^i \}$ and $B_j^i = \{ \phi_{jp}^i \mid p \in N^i \}$.

These equations simply state that, for an algorithm to approximate minimum-delay routing, it must establish loop-free paths and use a function $\Psi$ to allocate flows over those paths. We observe that Eq. (13) is the well-known Bellman-Ford (BF) equation for computing shortest paths, and Eq. (14) defines the successor set as the neighbors that are closer to the destination than the router itself. Note that the paths implied by the neighbors in the successor set of a router need not be of the same length. The function $\Psi$ in Eq. (15) is a heuristic function that determines the routing parameters. Because changing the routing parameters affects the marginal delays of the links (hence the link costs), we use regular updates of the link costs.
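Eqs. (13) and (14) can be sketched with a centralized Bellman-Ford computation on a small topology. This assumes globally consistent link costs, exactly the simplification that the following paragraphs point out is unrealistic in practice; the graph and function names are hypothetical.

```python
import math

def bellman_ford(nodes, cost, dest):
    """Eq. (13): D_j^i = min_k (D_j^k + l_k^i), with D_j^j = 0.

    cost: {(i, k): l_k^i} for every directed link i -> k
    """
    dist = {n: math.inf for n in nodes}
    dist[dest] = 0.0
    for _ in range(len(nodes) - 1):         # |N| - 1 rounds suffice with positive costs
        for (i, k), l in cost.items():
            if dist[k] + l < dist[i]:
                dist[i] = dist[k] + l
    return dist

def successor_set(i, dist, cost):
    """Eq. (14): neighbors k of i with D_j^k < D_j^i."""
    return {k for (h, k) in cost if h == i and dist[k] < dist[i]}

cost = {("A", "B"): 1.0, ("A", "C"): 2.0, ("B", "C"): 1.0,
        ("B", "D"): 3.0, ("C", "D"): 1.0}
dist = bellman_ford(["A", "B", "C", "D"], cost, dest="D")
print(dist)                                     # {'A': 3.0, 'B': 2.0, 'C': 1.0, 'D': 0.0}
print(sorted(successor_set("A", dist, cost)))   # ['B', 'C']
```

Router A obtains two successors, and the paths they imply have different lengths (for instance, A→C→D has cost 3 while A→B→D has cost 4), illustrating the unequal-cost multipath that Eq. (14) allows.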

The main problem with attempting to solve MDRP using Eqs. (13) to (15) directly is that these equations assume that routing information is consistent throughout the network. In practice, a node (router) must choose its distance and successor set using routing information obtained through its neighbors, and this information may be outdated. At any time $t$, for a particular destination $j$, the successor sets of all nodes define a routing graph $SG_j(t) = \{(m,n) \mid n \in S_j^m(t),\ m \in N\}$. In single-path routing, $S_j^i(t)$ has at most one neighbor: the neighbor that is on the shortest path to destination $j$. Accordingly, $SG_j(t)$ for single-path routing is a sink tree rooted at $j$ if loops are never created. In our case, the routing graph $SG_j(t)$ should be a directed acyclic graph in order for minimum delays to be approached.
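Whether a snapshot of successor sets forms a directed acyclic graph can be checked with a depth-first search that looks for a back edge. This sketch needs a global view of $SG_j(t)$, so it is only a simulation or debugging aid under that assumption, not part of any distributed algorithm; the function name is hypothetical.

```python
def routing_graph_is_acyclic(succ):
    """True if SG_j = {(m, n) | n in S_j^m} has no directed cycle.

    succ: {router m: set of successors S_j^m}
    """
    WHITE, GRAY, BLACK = 0, 1, 2            # unvisited / on DFS stack / finished
    color = {}

    def visit(m):
        color[m] = GRAY
        for n in succ.get(m, ()):
            c = color.get(n, WHITE)
            if c == GRAY:                   # back edge: a routing loop
                return False
            if c == WHITE and not visit(n):
                return False
        color[m] = BLACK
        return True

    return all(color.get(m, WHITE) != WHITE or visit(m) for m in succ)
```

A transient state such as `{"A": {"B"}, "B": {"A"}}`, where two routers temporarily point at each other, is reported as a loop.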

The blocking technique used in Gallager's algorithm ensures instantaneous loop-freedom. Likewise, to provide loop-free paths even when the network is in a transient state within the context of our framework, additional constraints must be imposed on the choice of successors at each router; essentially, these constraints must preclude the use of neighbors that may lead to looping.

Several algorithms have been proposed in the past to provide loop-free paths at every instant for the case of single-path routing (e.g., the Jaffe-Moss algorithm [15], DUAL [9], LPA [11], and the Merlin-Segall algorithm [19]), and one algorithm, DASM, has been proposed for the case of multiple paths per destination [28]. All these algorithms are based on the exchange of vectors of distances, together with some form of coordination among routers spanning one or multiple hops. The coordination among routers determines when the routers can update their routing tables. This coordination is in turn guided by local conditions that depend on the values of reported distances to destinations and that are sufficient to prevent loops from occurring.

We generalize the work to date on loop-free routing over single paths or multiple paths by means of the following loop-free invariant (LFI) conditions, which are applicable to any type of routing algorithm. We adopt the terminology and nomenclature first introduced for DUAL [9] to describe the LFI conditions.

Loop-free Invariant (LFI) conditions: Any routing algorithm designed such that the following two equations are always satisfied automatically provides loop-free paths at every instant, regardless of the type of routing algorithm being used:

$$FD_j^i \le D_{ji}^k, \quad k \in N^i \tag{16}$$

$$S_j^i = \{ k \mid D_{jk}^i < FD_j^i \wedge k \in N^i \} \tag{17}$$

where $D_{jk}^i$ is the value of $D_j^k$ reported to $i$ by its neighbor $k$ (so $D_{ji}^k$ is the value of $D_j^i$ reported to $k$ by $i$); and $FD_j^i$ is called the feasible distance of router $i$ for destination $j$ and is an estimate of $D_j^i$, in the sense that $FD_j^i$ equals $D_j^i$ in steady state but is allowed to differ from it temporarily during periods of network transitions.
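The two LFI conditions are local checks at router $i$. The sketch below assumes a hypothetical transition in which $i$'s distance to $j$ has just risen from 3 to 6 while its neighbors still hold the old value 3: Eq. (16) then forbids raising $FD_j^i$ above 3, so by Eq. (17) only neighbors reporting distances below 3 may be successors. The function names and numbers are illustrative only; this is not the full MPDA procedure.

```python
def lfi_successors(fd_i, reported):
    """Eq. (17): S_j^i = {k | D_jk^i < FD_j^i}.

    fd_i:     feasible distance FD_j^i of router i
    reported: {neighbor k: D_jk^i}, distance to j last reported to i by k
    """
    return {k for k, d in reported.items() if d < fd_i}

def satisfies_eq16(fd_i, held_by_neighbors):
    """Eq. (16): FD_j^i <= D_ji^k for every neighbor k, where
    held_by_neighbors[k] is the value of D_j^i that k currently holds."""
    return all(fd_i <= d for d in held_by_neighbors.values())

held = {"B": 3.0, "C": 3.0, "E": 3.0}
print(satisfies_eq16(3.0, held), satisfies_eq16(6.0, held))       # True False
print(sorted(lfi_successors(3.0, {"B": 2.0, "C": 4.0, "E": 1.0})))  # ['B', 'E']
```

Neighbor C is excluded even though its reported distance 4 is below i's new distance 6; this conservative choice is what prevents a loop while the neighbors' view is stale.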

In link-state algorithms, the values of $D_{jk}^i$ are determined locally from the link-state information supplied by the router's neighbors; in contrast, in distance-vector algorithms, the distances are communicated directly among neighbors. The following theorem verifies this key result of our framework.

Theorem 1 If the LFI conditions are satisfied at any time $t$, the routing graph $SG_j(t)$ implied by the successor sets $S_j^i(t)$ is loop-free.

Proof: Let $k \in S_j^i(t)$; then from Eq. (17) we have

$$D_{jk}^i(t) < FD_j^i(t) \tag{18}$$

At router $k$, because router $i$ is a neighbor of $k$, Eq. (16) gives $FD_j^k(t) \le D_{jk}^i(t)$. Combining this result with Eq. (18), we obtain

$$FD_j^k(t) < FD_j^i(t) \tag{19}$$

Eq. (19) states that, if $k$ is a successor of router $i$ in a path to destination $j$, then $k$'s feasible distance to $j$ is strictly less than the feasible distance of router $i$ to $j$. Now, if the successor sets defined a loop at time $t$ with respect to $j$, then applying Eq. (19) around the loop would yield, for some router $p$ on the loop, $FD_j^p(t) < FD_j^p(t)$, an absurd relation. Therefore, the LFI conditions are sufficient for loop-freedom. □

With the result of Theorem 1, Eq. (14) can be approximated with the LFI conditions to obtain a routing approach that does not require routing information to be globally consistent, at the expense of delays that may be longer than optimal. Accordingly, our framework for near-optimum-delay routing consists of finding the solution to the following equations using a distributed algorithm:

$$D_j^i = \min\{ D_{jk}^i + l_k^i \mid k \in N^i \} \tag{20}$$

$$FD_j^i \le D_{ji}^k, \quad k \in N^i \tag{21}$$

$$S_j^i = \{ k \mid D_{jk}^i < FD_j^i \wedge k \in N^i \} \tag{22}$$

$$\phi_{jk}^i = \Psi\left(k, \{ D_{jp}^i + l_p^i \mid p \in N^i \}, \{ \phi_{jp}^i \mid p \in N^i \}\right), \quad k \in N^i \tag{23}$$
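The framework leaves the heuristic $\Psi$ of Eq. (23) open; any choice that assigns positive fractions only to members of $S_j^i$ keeps traffic on the loop-free successor graph guaranteed by Theorem 1. As a purely illustrative example, and not necessarily the heuristic used in the paper's algorithm, the sketch below splits traffic in inverse proportion to the distance through each successor. The function name is hypothetical.

```python
def allocate_inverse_distance(succ_dist):
    """Hypothetical heuristic Psi for Eq. (23).

    succ_dist: {k: D_jk^i + l_k^i} for k in S_j^i, all values > 0
    Returns {k: phi_jk^i}; neighbors outside S_j^i implicitly get 0.
    Traffic is split in inverse proportion to the distance via each successor.
    """
    weights = {k: 1.0 / d for k, d in succ_dist.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

print(allocate_inverse_distance({"B": 2.0, "C": 4.0}))
```

With distances 2 and 4, successor B receives about two thirds of the traffic and C about one third. Because the rule ignores the current fractions $\{\phi_{jp}^i\}$, it does not use all the inputs that Eq. (23) makes available; a heuristic that adjusts the previous allocation incrementally would be a natural alternative.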
4 Implementing Near-Optimum-Delay Routing


We present an approach based on link-state information, rather than distance information, because extending our results to minimum-delay routing with additional constraints can be done more efficiently by working with link parameters than with path parameters, which are combinations of link parameters. Our approach consists of three components: computing multiple loop-free paths, distributing traffic over such paths, and computing link costs.

4.1 Computing Multiple Loop-free Paths

We describe the computation of multiple loop-free paths in two parts: computing $D_j^i$ using a shortest-path algorithm based on link-state information, and computing $S_j^i$ by extending that algorithm to support multiple successors along loop-free paths to each destination.

4.1.1 Computing $D_j^i$

There are many distributed algorithms for computing shortest paths, and any of them can be extended to provide multiple paths of equal and unequal costs as long as the extension obeys the LFI conditions introduced in the previous section.

The partial-topology dissemination algorithm (PDA) propagates enough link-state information in the network so that each router has sufficient link-state information to compute shortest paths to all destinations. In this respect, it is similar to other link-state algorithms (e.g., OSPF [20], SPTA [25], LVA [10], ALP [12]). PDA combines the best features of LVA, ALP and SPTA. As in LVA and ALP, a router communicates to its neighbors information regarding only those links that are part of its minimum-cost routing tree, and, as in SPTA, a router validates link information based on distances to the heads of links rather than on sequence numbers.
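The minimum-cost routing tree that a router reports can be obtained by running Dijkstra's algorithm over its topology table and recording, for each node reached, the link through which it was reached. This sketch assumes the table is a dictionary from (head, tail) to cost and ignores PDA's distance-based validation of link information; the function name is hypothetical.

```python
import heapq
import math

def shortest_path_tree(topology, source):
    """Dijkstra over a link-state table.

    topology: {(h, t): d}, the cost d of link h -> t (the [h, t, d] triplets)
    Returns (dist, tree): shortest distances from source, and the set of
    links (h, t, d) that form the minimum-cost routing tree.
    """
    adj = {}
    for (h, t), d in topology.items():
        adj.setdefault(h, []).append((t, d))
    dist = {source: 0.0}
    parent_link = {}                        # node -> link used to reach it
    heap = [(0.0, source)]
    done = set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:                       # stale heap entry
            continue
        done.add(u)
        for v, d in adj.get(u, []):
            nd = du + d
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                parent_link[v] = (u, v, d)
                heapq.heappush(heap, (nd, v))
    return dist, set(parent_link.values())
```

For a table in which A reaches C both directly (cost 4) and via B (cost 1 + 1), only the links A→B and B→C enter the tree, so the direct link A→C would not be reported to neighbors.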

PDA assumes that a router detects the failure, recovery and link-cost change of an adjacent link within a finite amount of time. An underlying protocol ensures that messages transmitted over an operational link are received correctly and in the proper sequence within a finite time, and are processed by the router one at a time in the order received. These are the same assumptions made for similar routing algorithms, and they can be easily satisfied in practice. Each router $i$ running PDA maintains the following information:

1. The main topology table, $T^i$, stores the characteristics of each link known to router $i$. Each entry in $T^i$ is a triplet $[h, t, d]$, where $h$ is the head, $t$ is the tail and $d$ is the cost of the link $h \to t$.

2. The neighbor topology table, $T_k^i$, is associated with each neighbor $k$. The table stores the link-state information communicated by neighbor $k$; that is, $T_k^i$ is a time-delayed version of $T^k$.
procedure INIT-PDA
{Invoked when the router comes up.}
begin
    Initialize all tables;
    call PDA;
end INIT-PDA

procedure PDA
{Executed at each router i. Invoked when an event occurs.}
begin
(1) call NTU;
(2) call MTU;  /* Updates T^i */
(3) if (there are changes to T^i) then
        Compose an LSU message consisting of topology
        differences using add, delete
        and change link entries;
    endif
(4) Within a finite amount of time, send the
    LSU message to all neighbors;
end PDA

Figure 1: The Partial-topology Dissemination Algorithm


3.  The  distance  table  stores  the  distanees  from  router  i  to  eaeh  destination  based  on  the  topology  in  T* 
and  the  distanees  from  eaeh  neighbor  k  to  eaeh  destination  based  on  the  topologies  in  T|  for  eaeh  k. 
The  distanee  of  router  i  to  node  j  in  T*  is  denoted  by  D} ;  the  distanee  from  k  to  j  in  is  denoted  by 


4. The routing table stores, for each destination j, the successor set $S^i_j$ and the feasible distance $FD^i_j$,
which is used by MPDA to enforce the LFI conditions.

5. The link table stores, for each neighbor k, the cost $l^i_k$ of the adjacent link to the neighbor.


The unit of information exchanged between routers is a link-state update (LSU) message. A router sends
an LSU message containing one or more entries, with each entry specifying the addition, deletion or change in
cost of a link in the router's main topology table $T^i$. Each entry of an LSU consists of link information in
the form of a triplet [h, t, d], where h is the head, t is the tail, and d is the cost of the link h → t. An LSU
message also contains an acknowledgment (ACK) flag for acknowledging the receipt of an LSU message from a
neighbor (used only by MPDA).
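The LSU format just described can be sketched in Python. This is a minimal illustration only, assuming a topology table is a dictionary keyed by (head, tail) with the cost d as value; the names `LsuEntry`, `Lsu` and `apply_lsu` and the string encoding of the add, delete and change operations are hypothetical and not part of PDA's specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Topology table: directed link (head h, tail t) -> cost d.
Topology = Dict[Tuple[str, str], float]


@dataclass(frozen=True)
class LsuEntry:
    op: str      # "add", "delete" or "change" (hypothetical encoding)
    head: str    # h
    tail: str    # t
    cost: float  # d; ignored for "delete"


@dataclass
class Lsu:
    entries: List[LsuEntry] = field(default_factory=list)
    ack: bool = False  # ACK flag, used only by MPDA


def apply_lsu(table: Topology, msg: Lsu) -> None:
    """Apply every [h, t, d] entry of an LSU to a neighbor topology table, in order."""
    for e in msg.entries:
        if e.op in ("add", "change"):
            table[(e.head, e.tail)] = e.cost
        elif e.op == "delete":
            table.pop((e.head, e.tail), None)
        else:
            raise ValueError(f"unknown LSU operation: {e.op}")


# Usage: neighbor k first reports two links, then changes one and deletes the other.
t_k: Topology = {}
apply_lsu(t_k, Lsu([LsuEntry("add", "k", "j", 2.0), LsuEntry("add", "j", "m", 1.0)]))
apply_lsu(t_k, Lsu([LsuEntry("change", "k", "j", 3.0),
                    LsuEntry("delete", "j", "m", 0.0)], ack=True))
print(t_k)  # {('k', 'j'): 3.0}
```

An LSU with no entries and the ACK flag set, `Lsu(ack=True)`, corresponds to the empty acknowledgment that MPDA sends when a router has nothing to report.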

The INIT-PDA procedure in Fig. 1 initializes the tables of a router at startup time; all variables of
type distance are initialized to infinity and those of type node are initialized to null. All successor sets are
initialized to the empty set. PDA is executed each time an event occurs; an event can be either the receipt of an
LSU message from a neighbor or the detection of an adjacent link-cost change. Procedure NTU (Neighbor
Topology Table Update), shown in Fig. 2, is used to process the received message and update the necessary
procedure NTU
begin
(1) if (LSU message is received from a neighbor k) then
    (1a) Update neighbor table $T^i_k$. That is, add links,
         delete links or change links according to the
         specification of each entry in the LSU;
    (1b) Run Dijkstra's shortest path algorithm
         on the resulting topology $T^i_k$; /* This results in
         finding minimum distances from k to all other
         nodes in $T^i_k$. Note that $T^i_k$ is a tree */
    (1c) Update $D^i_{jk}$ with new distances in $T^i_k$;
    endif
(2) if (adjacent link (i, k) comes up) then
        Update $l^i_k$ and send an LSU message to the
        neighbor k with link information of all links in
        its main topology table $T^i$;
    endif
(3) if (cost of an adjacent link (i, k) changed) then
        Update $l^i_k$;
    endif
(4) if (adjacent link (i, k) failed) then
        Update $l^i_k$ and clear the table $T^i_k$;
    endif
end NTU


Figure  2:  Neighbor  Topology  Table  Update  algorithm 

tables. Procedure MTU in Fig. 3 constructs the router's own shortest-path tree from the topologies reported
by its neighbors. The new shortest-path tree obtained is compared with the previous version to determine
the differences; only the differences are then reported to the neighbors. The router then waits for the next
event and, when it occurs, the whole process is repeated.
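The comparison step can be sketched as a dictionary diff that yields add, change and delete entries; an empty result corresponds to the case in which no LSU needs to be sent. This is a hypothetical helper (`topology_diff`), assuming each version of the shortest-path tree is stored as a dictionary mapping (head, tail) to cost.

```python
from typing import Dict, List, Tuple

Topology = Dict[Tuple[str, str], float]  # (head, tail) -> cost


def topology_diff(old: Topology, new: Topology) -> List[Tuple[str, str, str, float]]:
    """Return (op, head, tail, cost) entries that turn `old` into `new`."""
    entries = []
    for link, cost in sorted(new.items()):
        if link not in old:
            entries.append(("add", link[0], link[1], cost))
        elif old[link] != cost:
            entries.append(("change", link[0], link[1], cost))
    for link in sorted(old):
        if link not in new:
            entries.append(("delete", link[0], link[1], old[link]))
    return entries


old_tree = {("i", "a"): 1.0, ("a", "b"): 2.0, ("i", "c"): 4.0}
new_tree = {("i", "a"): 1.0, ("a", "b"): 5.0, ("a", "c"): 1.0}
print(topology_diff(old_tree, new_tree))
# [('change', 'a', 'b', 5.0), ('add', 'a', 'c', 1.0), ('delete', 'i', 'c', 4.0)]
```

Sorting the links only makes the output deterministic; the order of entries within one LSU does not matter in this sketch because each entry refers to a distinct link.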

The algorithm MTU at router i merges the topologies $T^i_k$ and the adjacent links $l^i_k$ to obtain $T^i$. The
merge process is straightforward if all neighbor topologies contain disjoint sets of links, but when two or
more neighbors report conflicting information regarding a particular link, the conflict has to be resolved.
Sequence numbers may be used to distinguish between old and new link information, as in OSPF, but PDA
resolves the conflict as follows. If two or more neighbors report information about link (m, n), then router
i updates topology table $T^i$ with the link information reported by the neighbor that offers the shortest
distance from router i to the head node m of the link. Ties are broken in favor of the neighbor with the
lowest address. For adjacent links, router i itself is the head of the link and thus has the shortest distance.
Therefore, any information about an adjacent link supplied by neighbors is overridden by the most
current information about the link available to router i. Dijkstra's shortest path algorithm is run on $T^i$, and
only the links that constitute the shortest-path tree are retained. Note that, because there are potentially many
shortest-path trees, ties should be broken consistently during the run of Dijkstra's algorithm.
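The merge rule can be sketched for a single router as follows. It is a minimal sketch, assuming topology tables are dictionaries mapping (head, tail) to cost, neighbor topologies are given as `nbr_topo[k]` and adjacent link costs as `link_cost[k]`; the function names `dijkstra` and `mtu` are hypothetical, and the sketch omits saving the old table and updating the distance table. Dijkstra's algorithm here breaks ties by node name, so the same tree is produced on every run.

```python
import heapq
from typing import Dict, List, Tuple

Topology = Dict[Tuple[str, str], float]  # (head, tail) -> cost


def dijkstra(topo: Topology, src: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Distances and tree predecessors from src; ties are broken by node name."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (h, t), d in topo.items():
        adj.setdefault(h, []).append((t, d))
    dist: Dict[str, float] = {src: 0.0}
    pred: Dict[str, str] = {}
    heap = [(0.0, src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in sorted(adj.get(u, [])):
            if v not in dist or d + w < dist[v]:  # strict '<' keeps the first tree found
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, pred


def mtu(i: str, nbr_topo: Dict[str, Topology], link_cost: Dict[str, float]) -> Topology:
    """Merge neighbor topologies into router i's shortest-path tree (MTU steps 2-6)."""
    nbr_dist = {k: dijkstra(t, k)[0] for k, t in nbr_topo.items() if k in link_cost}
    nodes = {n for k in nbr_dist for link in nbr_topo[k] for n in link}
    merged: Topology = {}
    for j in sorted(nodes - {i}):
        # Preferred neighbor p minimizes D^i_jk + l^i_k; ties go to the lower address.
        options = [(nbr_dist[k][j] + link_cost[k], k) for k in nbr_dist if j in nbr_dist[k]]
        if not options:
            continue
        _, p = min(options)
        # Copy every link headed at j from the table reported by p.
        merged.update({(h, t): d for (h, t), d in nbr_topo[p].items() if h == j})
    # Router i's own adjacent-link costs override anything the neighbors reported.
    merged.update({(i, k): c for k, c in link_cost.items()})
    # Keep only the links of i's shortest-path tree.
    _, pred = dijkstra(merged, i)
    return {(u, v): merged[(u, v)] for v, u in pred.items()}


t_a = {("a", "j"): 1.0, ("j", "m"): 5.0}   # tree reported by neighbor a
t_b = {("b", "m"): 1.0, ("b", "j"): 4.0}   # tree reported by neighbor b
tree = mtu("i", {"a": t_a, "b": t_b}, {"a": 1.0, "b": 2.0})
print(sorted(tree.items()))
# [(('a', 'j'), 1.0), (('b', 'm'), 1.0), (('i', 'a'), 1.0), (('i', 'b'), 2.0)]
```

In the usage example, link (j, m) reported by a is copied into the merged table but then dropped, because m is closer to i through b.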
procedure MTU at router i
begin
(1) $oldT^i \leftarrow T^i$; /* Save copy */
(2) if (node j occurs in at least one of the tables $T^i_k$) then
        add j to the main topology table $T^i$;
    endif
(3) foreach node j in $T^i$ do
        MIN $\leftarrow \min\{D^i_{jk} + l^i_k \mid k \in N^i\}$;
        let p be such that MIN $= D^i_{jp} + l^i_p$;
        /* Neighbor p is the preferred neighbor for
           destination j. Ties are broken in favor of the
           lower-address neighbor */
    done
(4) foreach j in $T^i$ and its preferred neighbor p do
        Copy all links (j, n) from $T^i_p$ to $T^i$;
        /* i.e., copy all links in $T^i_p$ for which
           j is the head node */
    done
(5) Update $T^i$ with the information of each adjacent link $l^i_k$;
(6) Run Dijkstra's shortest path algorithm on $T^i$
    and remove those links in $T^i$ that are not
    part of the shortest path tree;
(7) Update $D^i_j$ with new distances in $T^i$;
(8) Compare $oldT^i$ with $T^i$ and note all differences;
end MTU


Figure  3:  Main  Topology  Table  Update  Algorithm 

We have shown [26] that the topology tables at all nodes converge to the shortest paths within a finite
time after the last link-cost change in the network. After convergence, because there are no more changes to
the topology tables, no more LSU messages are generated.


4.1.2 Computing $S^i_j$

The LFI conditions introduced in Section 3 suggest a technique for computing $S^i_j$ such that the implied
routing graph $SG_j$ is loop-free at every instant. To determine $FD^i_j$ in Eq. (16), router i needs to know
the distance from i to node j in the topology table $T^k_i$. Because of propagation delays, there may be
discrepancies between the main topology table $T^i$ at router i and its copy $T^k_i$ at the neighbor k. However, at
time t, the topology table $T^k_i$ is a copy of the main topology table $T^i$ at some earlier time t' < t. Logically,
if a copy of $T^i$ is saved each time an LSU is sent, a feasible distance $FD^i_j$ that satisfies the LFI conditions
can be found in the history of values of $D^i_j$ that have been saved.
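A small numeric illustration of this idea, under stated assumptions: the feasible distance is taken here as the minimum of the current distance and the saved distances that neighbors may still hold (a conservative choice assumed for the sketch, not MPDA's actual mechanism, which avoids keeping the history by synchronizing LSU exchanges), and the successor set is then formed with the rule $S^i_j = \{k \mid D^i_{jk} < FD^i_j\}$ used in step (4) of MPDA. The function names and values are hypothetical.

```python
from typing import Dict, List, Set


def feasible_distance(saved: List[float], current: float) -> float:
    """Illustrative FD^i_j: no larger than the current distance or any saved distance
    that a neighbor may still hold for router i (assumed conservative choice)."""
    return min(saved + [current])


def successor_set(nbr_dist: Dict[str, float], fd: float) -> Set[str]:
    """S^i_j = {k | D^i_jk < FD^i_j}: neighbors strictly closer to j than FD^i_j."""
    return {k for k, d in nbr_dist.items() if d < fd}


# Hypothetical values: i reported 5.0 and then 7.0 for j, and now computes 8.0.
fd = feasible_distance([5.0, 7.0], 8.0)
succ = successor_set({"a": 4.0, "b": 6.0, "c": 3.5}, fd)
print(fd, sorted(succ))  # 5.0 ['a', 'c']
```

Neighbor b is excluded even though its distance is below the current distance 8.0: a neighbor that still believes i is at distance 5.0 could be routing through i, so admitting b could close a loop.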

The multiple-path partial-topology dissemination algorithm, or MPDA, shown in Fig. 4, is a modification
of PDA that enforces the LFI conditions by synchronizing the exchange of LSUs between neighbors.
procedure MPDA at router i
{invoked when an event occurs}
begin
(1) call NTU;
(2) if (node is in PASSIVE state) then
    (2a) call MTU; /* update $T^i$ and $D^i_j$ */
    (2b) $FD^i_j \leftarrow \min\{FD^i_j, D^i_j\}$;
    endif
(3) if (node is in ACTIVE state and the
        last ACK is received) then
    (3a) $temp^i_j \leftarrow D^i_j$; Set node to PASSIVE state;
    (3b) call MTU to update $T^i$;
    (3c) $FD^i_j \leftarrow \min\{temp^i_j, D^i_j\}$;
    endif
(4) $S^i_j \leftarrow \{k \mid D^i_{jk} < FD^i_j\}$;
(5) if (changes occur in $T^i$) then
        Set node to ACTIVE state;
    endif
    if (no changes occur in $T^i$ and the event is
        the last ACK) then
        Set node to PASSIVE state;
    endif
(6) if (there are changes to $T^i$) then
        Compose a new LSU for each neighbor with the topology
        changes expressed as add link,
        delete link and change link;
    endif
(7) if (input event received is an LSU message from a neighbor) then
        Add the ACK entry to the newly composed LSU of that neighbor;
    endif
(8) Send the new LSU messages;
end MPDA


Figure  4:  Multiple-path  Partial-topology  Dissemination  Algorithm  (MPDA) 

In MPDA, each LSU message sent by a router is acknowledged by all its neighbors before the router sends
the next LSU. The inter-neighbor synchronization used in MPDA spans only a single hop, unlike the
synchronization in diffusing computations [7], which potentially spans the whole network. A router is said to be
in ACTIVE state when it is waiting for its neighbors to acknowledge the LSU message it sent; otherwise, it
is in PASSIVE state.

Assume that, initially, all routers are in PASSIVE state with the correct distances to
all destinations. Then a series of link-cost changes occurs in the network, causing some or all routers
to go through a sequence of PASSIVE-to-ACTIVE and ACTIVE-to-PASSIVE state transitions, until all
routers become PASSIVE with correct distances to destinations.

If a router in a PASSIVE state receives an event that does not change its topology $T^i$, then the router has
Figure 5: Active-passive phase transitions in MPDA (panels: passive-to-active transitions, implicit transition, active-to-passive transitions).

nothing to report and remains in PASSIVE state. However, if a router in PASSIVE state receives an event
that causes a change in its topology, the router sends those changes to its neighbors, goes into ACTIVE
state and waits for ACKs. Events that occur during the ACTIVE period are processed to update $T^i_k$ and $D^i_{jk}$,
but not $T^i$; the updating of $T^i$ by MTU is deferred until the end of the ACTIVE phase. At the end of the
ACTIVE phase, when ACKs from all neighbors are received, router i updates $T^i$ with changes that may
have occurred in $T^i_k$ due to events received during the ACTIVE phase. If no changes occurred in $T^i$ that
need reporting, then the router becomes PASSIVE; otherwise, as shown in Fig. 5, the events have produced changes
in $T^i$ of which the neighbors must be notified. This results in a new LSU, and the router immediately becomes
ACTIVE again. In this case, there is an implicit PASSIVE period of zero length between two back-to-back
ACTIVE periods, as illustrated in Fig. 5. A router i receiving an
LSU message from k must send back an LSU with the ACK bit set after updating $T^i_k$. If the router does not
have any updates to send, either because it is in ACTIVE state or because it does not have any changes to
report, it sends back an empty LSU with just the ACK flag set. When a router detects that an adjacent link
failed, any pending ACKs from the neighbor at the other end of the link are treated as received. Because all
LSUs are acknowledged within a finite time, no deadlocks can occur. The loop-freedom property of MPDA
is proven in [26].
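The per-destination effect of the ACTIVE and PASSIVE phases on $FD^i_j$ can be sketched as a small state machine that follows steps (2) to (8) of Fig. 4. This is an illustrative model under simplifying assumptions: one destination, MTU replaced by the distance it would return, and no modeling of outgoing ACKs or link failures; the class and method names are hypothetical. The trace shows the key point: an increase in $D^i_j$ reaches $FD^i_j$ only after every neighbor has acknowledged the LSU that reported it.

```python
import math
from typing import Iterable, Optional, Set


class MpdaDestinationSketch:
    """ACTIVE/PASSIVE handling of FD^i_j for one destination j at router i (after Fig. 4).

    `mtu_distance` stands in for running MTU: it is the distance D^i_j that MTU
    would compute from the current neighbor tables (a hypothetical simplification).
    """

    def __init__(self, neighbors: Iterable[str]) -> None:
        self.neighbors = set(neighbors)
        self.dist = math.inf            # D^i_j as of the last MTU run
        self.fd = math.inf              # FD^i_j
        self.active = False
        self.pending_acks: Set[str] = set()
        self.lsus_sent = 0

    def _run_mtu(self, mtu_distance: float) -> bool:
        changed = mtu_distance != self.dist
        self.dist = mtu_distance
        return changed

    def on_event(self, mtu_distance: float, ack_from: Optional[str] = None) -> None:
        if ack_from is not None:
            self.pending_acks.discard(ack_from)
        changed = False
        if not self.active:                          # step (2)
            changed = self._run_mtu(mtu_distance)
            self.fd = min(self.fd, self.dist)
        elif not self.pending_acks:                  # step (3): last ACK received
            temp = self.dist
            self.active = False
            changed = self._run_mtu(mtu_distance)
            self.fd = min(temp, self.dist)
        # Otherwise ACTIVE with ACKs outstanding: T^i, D^i_j and FD^i_j stay frozen.
        if changed:                                  # steps (5)-(8): report, wait for ACKs
            self.active = True
            self.pending_acks = set(self.neighbors)
            self.lsus_sent += 1


r = MpdaDestinationSketch(["a", "b"])
r.on_event(5.0)                  # PASSIVE -> ACTIVE; FD = 5
r.on_event(9.0, ack_from="a")    # still waiting for b; the increase to 9 is deferred
r.on_event(9.0, ack_from="b")    # last ACK: D becomes 9, FD = min(5, 9) = 5, ACTIVE again
r.on_event(9.0, ack_from="a")
r.on_event(9.0, ack_from="b")    # last ACK, no change: FD rises to 9, PASSIVE
print(r.fd, r.active, r.lsus_sent)  # 9.0 False 2
```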

4.2  Distributing  Traffic  over  Multiple  Paths 

In general, the function $\Psi$ can be any function that satisfies Property 1, but our objective is to obtain a
function $\Psi$ whose load balancing is as close as possible to perfect load balancing (Eqs. (10)-(12)).

The function $\Psi$ should also be suitable for use in dynamic networks, where the flows over links are
continuously  changing,  causing  continuous  link-cost  changes.  To  respond  to  these  changes,  queueing  delays 
at  the  links  must  be  measured  periodically  and  routing  paths  must  be  recomputed.  However,  re-computing 
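procedure IH
begin
(1) $\phi^i_{jk} \leftarrow 0$, for all $k \in N^i$;
(2) if ($|S^i_j| = 1$) then
        $\phi^i_{jk} \leftarrow 1$, for $k \in S^i_j$;
    endif
(3) if ($|S^i_j| > 1$) then
        $\phi^i_{jk} \leftarrow \frac{1}{|S^i_j| - 1}\left(1 - \frac{D^i_{jk} + l^i_k}{\sum_{q \in S^i_j} (D^i_{jq} + l^i_q)}\right)$, for each $k \in S^i_j$;
    endif
end IH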


Figure  6:  Heuristic  for  initial  load  assignment. 

paths frequently consumes excessive bandwidth and may also cause oscillations. Therefore, routing-path
changes should only be made at sufficiently long intervals. Unfortunately, a network cannot be responsive to
short-term traffic bursts if only long-term updates are performed. For this reason, we use link costs measured
over two different intervals: link costs measured over short intervals of length $T_s$ are used for routing-parameter
computation, and link costs measured over longer intervals of length $T_l$ are used for routing-path
computation [17]. In general, $T_l$ must be several times longer than $T_s$. Long-term updates are designed to
handle long-term traffic changes and are used by the routing protocol to update the successor sets at each
router, so that the new routing paths are the shortest paths under the new traffic conditions. The short-term
updates made every $T_s$ seconds are designed to handle short-term traffic fluctuations that occur between
long-term routing-path updates and are used to compute the routing parameters in Eq. (15) locally at
each router. Accordingly, our traffic distribution heuristics assume a constant successor set and successor
graph.

When $S^i_j$ is computed for the first time or recomputed due to long-term route changes, traffic
should be freshly distributed. In this case, the allocation heuristic function $\Psi$ is a function of only the
marginal distances through the successor set. That is, Eq. (15) reduces to the form
$\phi^i_{jk} = \Psi(k, \{D^i_{jp} + l^i_p \mid p \in N^i\})$. When a new successor set $S^i_j$ is computed, algorithm IH in Fig. 6 is first used to distribute
traffic over the successor set [17]. Note that the values $\{\phi^i_{jk}\}$ computed in IH satisfy Property 1. Furthermore, when
more than one successor is present, if $D^i_{jp} + l^i_p > D^i_{jq} + l^i_q$ for successors p and q, then $\phi^i_{jp} < \phi^i_{jq}$. The
heuristic makes sense because the greater the marginal delay through a particular neighbor, the
smaller the fraction of traffic that is forwarded to that neighbor.
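IH can be sketched in Python as follows. The sketch assumes the allocation rule $\phi^i_{jk} = (1 - m_k / \sum_{q \in S^i_j} m_q)/(|S^i_j| - 1)$, where $m_k = D^i_{jk} + l^i_k$ is a hypothetical shorthand for the marginal distance through successor k; this rule is consistent with the two properties just stated (the fractions are non-negative and sum to one, and a larger marginal distance yields a smaller fraction), but it is offered as an assumption for illustration. The function name `initial_heuristic` is hypothetical.

```python
from typing import Dict


def initial_heuristic(marginal: Dict[str, float]) -> Dict[str, float]:
    """IH sketch: marginal[k] = D^i_jk + l^i_k for each successor k in S^i_j.

    Neighbors that are not successors are simply absent and get a zero fraction.
    """
    n = len(marginal)
    if n == 0:
        return {}
    if n == 1:
        return {k: 1.0 for k in marginal}
    total = sum(marginal.values())
    if total <= 0:
        raise ValueError("marginal distances are assumed to be positive")
    return {k: (1.0 - d / total) / (n - 1) for k, d in marginal.items()}


phi = initial_heuristic({"a": 2.0, "b": 3.0, "c": 5.0})
print({k: round(v, 2) for k, v in phi.items()})  # {'a': 0.4, 'b': 0.35, 'c': 0.25}
```

The fractions sum to one because $\sum_k (1 - m_k/\sum_q m_q) = |S^i_j| - 1$. The spread between successors is modest, which is why the text notes that IH's initial distribution is far from balanced.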

After the first flow assignment is made over a newly computed successor set using algorithm IH, a different
flow allocation heuristic algorithm, AH, shown in Fig. 7, is used to adjust the routing parameters every $T_s$
seconds until the successor set changes again. The heuristic function $\Psi$ computed in AH is incremental and,
unlike IH, is a function of the current flow allocation on the successor sets and the marginal distances through
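procedure AH
begin
(1) $a^i_{min} \leftarrow \min\{D^i_{jk} + l^i_k \mid k \in S^i_j\}$;
    /* Let $k_0$ be the neighbor that offers this minimum */
(2) foreach $k \in S^i_j$ do
        $a^i_{jk} \leftarrow D^i_{jk} + l^i_k - a^i_{min}$;
    done
(3) $\Delta \leftarrow \min\{\phi^i_{jk} / a^i_{jk} \mid k \in S^i_j \wedge a^i_{jk} \neq 0\}$;
(4) foreach $k \neq k_0$, $k \in S^i_j$ do
        $\Delta_{jk} \leftarrow \Delta \cdot a^i_{jk}$;
        $\phi^i_{jk} \leftarrow \phi^i_{jk} - \Delta_{jk}$;
    done
(5) for $k = k_0$ do
        $\phi^i_{jk} \leftarrow \phi^i_{jk} + \sum_{q \in S^i_j, q \neq k_0} \Delta_{jq}$;
    done
end AH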


Figure  7:  Heuristic  for  incremental  load  adjustment. 


the successors. AH also preserves Property 1 at every instant. In AH, traffic is incrementally moved from the
links with large marginal delays to the link with the least marginal delay. The amount of traffic moved away
from a link is proportional to how large the marginal delay of the link is compared to that of the best successor
link. The heuristic tends to distribute traffic in such a way that Eqs. (10)-(12) hold true. This is important,
because the initial distribution obtained by IH is far from balanced. The computational complexity of
the heuristic allocation algorithms is $O(|N^i|)$. Because the heuristics are run for each active destination, the
whole load-balancing activity is $O(N)$.
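One incremental adjustment step of the kind described above can be sketched in Python. This is a hedged sketch rather than the exact AH: it assumes the step size is the largest value that keeps every fraction non-negative (the minimum of $\phi^i_{jk}$ divided by the excess marginal distance, over successors with non-zero excess), and the function name `adjust_heuristic` is hypothetical. Each non-best successor loses traffic in proportion to its excess marginal distance over the best successor $k_0$, and everything removed is added to $k_0$, so the fractions still sum to one.

```python
from typing import Dict


def adjust_heuristic(phi: Dict[str, float], marginal: Dict[str, float]) -> Dict[str, float]:
    """AH sketch: move traffic toward the successor k0 with the least marginal distance.

    phi[k] is the current fraction for successor k; marginal[k] = D^i_jk + l^i_k.
    """
    k0 = min(marginal, key=lambda k: (marginal[k], k))   # ties go to the lower address
    excess = {k: marginal[k] - marginal[k0] for k in marginal}
    ratios = [phi[k] / excess[k] for k in marginal if excess[k] > 0]
    if not ratios:                                        # all successors equally good
        return dict(phi)
    step = min(ratios)                                    # assumed step: no fraction goes negative
    new_phi = dict(phi)
    moved = 0.0
    for k in marginal:
        if k != k0:
            # Traffic removed is proportional to the excess marginal distance of k.
            new_phi[k] = max(phi[k] - step * excess[k], 0.0)
            moved += phi[k] - new_phi[k]
    new_phi[k0] += moved                                  # fractions still sum to one
    return new_phi


phi = {"a": 0.4, "b": 0.35, "c": 0.25}
marginal = {"a": 2.0, "b": 3.0, "c": 5.0}
print({k: round(v, 3) for k, v in adjust_heuristic(phi, marginal).items()})
# {'a': 0.733, 'b': 0.267, 'c': 0.0}
```

With this step size one successor drops to zero in every step, which is aggressive; a fraction of the step could be used instead if smoother adjustments are wanted.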

Unlike $\eta$ in Gallager's algorithm, $T_l$ and $T_s$ are local constants that are set independently at each router.
Convergence of our algorithm does not depend critically on these constants, as optimal routing does on $\eta$.
Also, $T_l$ and $T_s$ need not be static constants and can be made to vary according to congestion at the router.
The value of $T_l$, however, should be sufficiently longer than the time it takes to compute
the shortest paths. The long-term update periods should be phased randomly at each router, because of the
problems that would result from synchronization of updates [3].
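The two update timescales and the random phasing of long-term updates can be illustrated with a small timeline generator. This is a hypothetical sketch: `update_schedule` and the event labels are invented for illustration, the long-term phase is drawn uniformly from $[0, T_l)$ as one possible way to desynchronize routers, and $T_s \le T_l$ is allowed, since the simulations also use $T_s = T_l$.

```python
import random
from typing import List, Tuple


def update_schedule(t_l: float, t_s: float, horizon: float,
                    rng: random.Random) -> List[Tuple[float, str]]:
    """Times at which one router runs long-term and short-term updates (illustrative)."""
    if not t_l >= t_s > 0:
        raise ValueError("expected T_l >= T_s > 0")
    phase = rng.uniform(0.0, t_l)   # random phase: routers do not update in lockstep
    events: List[Tuple[float, str]] = []
    n = 0
    while phase + n * t_l < horizon:
        events.append((phase + n * t_l, "long"))   # recompute successor sets, then run IH
        n += 1
    m = 1
    while m * t_s < horizon:
        events.append((m * t_s, "short"))          # run AH on the current successor sets
        m += 1
    return sorted(events)


schedule = update_schedule(t_l=20.0, t_s=5.0, horizon=60.0, rng=random.Random(1))
print(sum(k == "long" for _, k in schedule), sum(k == "short" for _, k in schedule))  # 3 11
```

Event times are computed as multiples of the interval rather than by repeated addition, so floating-point drift does not accumulate over long horizons.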

4.3  Computing  Link  Costs 

As mentioned earlier, the cost of a link is the marginal delay over the link, $D'_{ik}(f_{ik})$.

If the links are assumed to behave like M/M/1 queues, then the marginal delay $D'_{ik}(f_{ik})$ can be obtained
in closed form by differentiating the following equation [16]:
$$D_{ik}(f_{ik}) = \frac{f_{ik}}{C_{ik} - f_{ik}} + \tau_{ik} f_{ik} \qquad (24)$$

where $f_{ik}$ is the flow through the link (i, k), and $C_{ik}$ and $\tau_{ik}$ are the capacity and propagation delay of the
link. Because the M/M/1 assumption does not hold in practice in the presence of very bursty traffic, and
because Eq. (24) becomes unstable when $f_{ik}$ approaches $C_{ik}$, an on-line estimation of the marginal delays
is desirable.
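Differentiating Eq. (24) gives $D'_{ik}(f_{ik}) = C_{ik}/(C_{ik} - f_{ik})^2 + \tau_{ik}$. The short sketch below evaluates both expressions and shows how quickly the marginal delay grows as the flow approaches capacity, which is the instability mentioned above. The function names are hypothetical, and units are assumed consistent (flow and capacity in the same rate unit, propagation delay in seconds).

```python
import math


def mm1_delay(f: float, c: float, tau: float) -> float:
    """Eq. (24): D(f) = f / (C - f) + tau * f; unbounded once f reaches C."""
    if f >= c:
        return math.inf
    return f / (c - f) + tau * f


def mm1_marginal_delay(f: float, c: float, tau: float) -> float:
    """Derivative of Eq. (24) with respect to f: D'(f) = C / (C - f)**2 + tau."""
    if f >= c:
        return math.inf
    return c / (c - f) ** 2 + tau


for f in (2.0, 8.0, 9.0):
    print(f, round(mm1_delay(f, c=10.0, tau=0.001), 3),
          round(mm1_marginal_delay(f, c=10.0, tau=0.001), 3))
```

With $C_{ik} = 10$, the marginal delay rises from about 0.16 at 20% utilization to about 10 at 90% utilization, so small errors in the measured flow near capacity translate into large swings in link cost.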

Several techniques for computing marginal delays are currently available (e.g., [23, 22, 6]).
For the purposes of simulations, we borrow a technique introduced by Cassandras, Abidi and Towsley [6]
for on-line estimation of the marginal delay $D'_{ik}(f_{ik})$. The technique uses perturbation analysis (PA) for
the on-line estimation and is shown to perform better than the M/M/1 estimation. In addition, the PA
estimation does not require a priori knowledge of the link capacities. This is very significant, because the
capacity available to best-effort traffic in real networks varies according to the capacity allocated to other
types of traffic, such as real-time traffic. We must emphasize that our approach does not depend on which
specific technique is used for marginal-delay estimation, although some methods may be better than others;
in particular, the convergence and stability of our routing algorithm do not depend on that choice.


5  Simulations 

The simulations discussed in this section illustrate the effectiveness of our near-optimal framework and
demonstrate the significant improvements achieved by our approach over single-path routing in static and
dynamic environments. The delays obtained by optimal routing, single-path routing and our approximation
scheme are compared under identical topological and traffic environments. The results show that the average
delays achieved via our approximation scheme are comparable to those of optimal routing under quasi-static
environments (within a small percentage difference rather than a several-fold difference), and that they are
significantly better than those of single-path routing in a dynamic environment.

For optimal routing, we implemented the algorithm described by Gallager [8] and label it 'OPT'.
The plots of our approximation scheme are labeled 'MP'. To obtain representative delays for single-path
routing algorithms, we opted to restrict our multipath routing algorithm to use only the best successor for
packet forwarding, instead of simulating any specific shortest-path algorithm. Because of the instantaneous
loop-freedom property that MPDA exhibits, the shortest-path delays obtained this way are better than or
similar to the delays obtained with either EIGRP [1], which is based on DUAL and requires much more
internodal synchronization than our scheme, rendering longer delays, or RIP [14] and OSPF [20], which do
not prevent temporary loops. We use the label 'SP' for single-path routing in the graphs.

We performed simulations on the topologies shown in Fig. 8. CAIRN (www.cairn.net) is a real network
and NET1 is a contrived network. We are only interested in the connectivity of CAIRN, and its topology
Figure 8: Topologies used in simulations (CAIRN and NET1 topologies).

as used differs from the real network in the capacities and propagation delays assumed in the simulation
experiments. We restricted the link capacities to a maximum of 10 Mb/s, so that it becomes easy to sufficiently
load the networks. NET1 has a connectivity that is high enough to ensure the existence of multiple paths,
and small enough to prevent a large number of one-hop paths. The diameter of NET1 is four and the nodes
have degrees between 3 and 5. In each network we set up flows between several source-destination pairs and
measure the average delays of each flow. The flows in CAIRN are set up between these source-destination
pairs: (lbl, mci-r), (netstar, isie), (isi, darpa), (parc, sdsc), (sri, mit), (tioc, sdsc), (mit, sri), (isie, netstar), (sdsc,
parc), (mci-r, tioc), (darpa, isi). For NET1, the source-destination pairs are: (9,2), (8,3), (7,0), (6,1), (5,8),
(4,1), (3,8), (2,9), (1,6), (0,7).

The flows have bandwidths in the range 0.2-1.0 Mb/s. For simplicity, we used a stable topology (links and
nodes do not fail) in all the simulations. In the presence of link failures, MP can only perform better than SP,
because of the availability of alternate paths. Furthermore, OPT is not fast enough to respond to drastic topology
changes. Because MP is parameterized by the $T_l$ and $T_s$ update intervals, its delay plots are labeled
MP-TL-xx-TS-yy, where xx is the $T_l$ update interval and yy is the $T_s$ update interval, both measured in seconds.
Similarly, the delays of shortest-path routing are labeled SP-TL-xx, where xx is the $T_l$ update period.

5.1  Performance  under  Stationary  Traffic 

Fig. 9 shows the average delays of flows in CAIRN for OPT and MP routing. The flow IDs are plotted
on the x-axis and average delays of the flows are plotted on the y-axis. Plot OPT-25 represents the 25%
'envelope'; that is, the delays of OPT are increased by 25% to obtain the OPT-25 plot. As can be seen, the
average delays of flows under MP routing are within the OPT-25 envelope. Similarly, in Fig. 10, the delays
obtained using MP routing for NET1 are within a 28% envelope of the delays obtained using OPT routing. We
say that the delays of MP are 'comparable' to those of OPT if they are within a small percentage of those of OPT.
Fig. 11 compares the average delays of MP and SP for CAIRN. We observe that the delays of SP for some
Figure 9: Delays of OPT and MP in CAIRN.

flows are two to four times those of MP. In Fig. 12, for NET1, MP routing performs even better: average
delays of SP are as much as five to six times those of MP routing, which is due to the higher connectivity available
in NET1. Also observe that, because of the load balancing used in MP, the plots of MP are less jagged than
those of SP. MP routing performs much better than SP in high-connectivity and high-load environments.
When connectivity is low or network load is light, MP routing cannot offer any advantage over SP.
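The 'envelope' comparison used above reduces to simple arithmetic: MP is within the OPT-x envelope when every flow's MP delay is at most (1 + x/100) times its OPT delay, and the smallest such x is the largest per-flow percentage excess. A minimal sketch with hypothetical delay values (not the measured results); the function names are illustrative.

```python
from typing import Sequence


def smallest_envelope_pct(mp: Sequence[float], opt: Sequence[float]) -> float:
    """Smallest x such that mp[f] <= (1 + x/100) * opt[f] for every flow f."""
    if len(mp) != len(opt) or not mp:
        raise ValueError("need one MP and one OPT delay per flow")
    return max(0.0, max(100.0 * (m / o - 1.0) for m, o in zip(mp, opt)))


def within_envelope(mp: Sequence[float], opt: Sequence[float], pct: float) -> bool:
    """True if all MP delays lie inside the OPT-pct envelope."""
    return all(m <= o * (1.0 + pct / 100.0) for m, o in zip(mp, opt))


# Hypothetical per-flow average delays (illustrative only).
opt_delays = [10.0, 20.0, 15.0]
mp_delays = [12.0, 22.0, 15.0]
print(round(smallest_envelope_pct(mp_delays, opt_delays), 1),
      within_envelope(mp_delays, opt_delays, 25.0))  # 20.0 True
```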

5.2 Effect of Tuning Parameters $T_l$ and $T_s$

The performance of MP depends on the update intervals $T_l$ and $T_s$. The setting of $T_l$ and $T_s$, however, is
simple. They are local and can be set independently at each node without affecting convergence, unlike the
global constant $\eta$, which is critical for convergence of OPT. For CAIRN, Fig. 13 shows the effect of increasing
$T_l$ when $T_s$ and the input traffic are fixed. Observe that when $T_l$ is increased from 10 to 20 seconds, the delays
in SP more than double, while the delays of MP remain relatively unchanged. This indicates
that $T_l$ can be made longer in MP without significantly affecting performance. This is significant, because
sending frequent update messages consumes bandwidth and can also cause oscillations under high loads.
Similarly, for NET1, delays for SP increase significantly while there is a negligible change in the delays of MP,
as can be observed in Fig. 14. Our new routing framework provides the means for a trade-off
between update messages and local load balancing.

At $T_s$ intervals, the load-balancing heuristics are executed; these are strictly local computations and
require no communication. Therefore, $T_s$ can be set according to the processing power available at the
router. $T_l$ can be made from a few times to orders of magnitude greater than $T_s$. In the simplest case, $T_s$
can be set to the same value as $T_l$ and still yield significant performance gains, as shown in Figs. 11 and 12. In the
Figure 10: Delays of OPT and MP in NET1.

figures, we observe that MP-TL-10-TS-10 is much closer to OPT than SP-TL-10. Just the long-term routes
with load balancing, without short-term routing-parameter updates, seem to give significant gains; the major
gains here are due to the mere presence of multiple successors and load balancing. Our experience from
simulations indicates that a $T_l$ only a few times longer than $T_s$ suffices to gain significant benefits.
This is encouraging, because it means that fine tuning of $T_l$ and $T_s$ is not important for our approach to be
efficient.

5.3  Performance  under  Dynamic  Traffic 

It was stated earlier that OPT has very poor response to traffic fluctuations. This becomes evident in Fig. 15,
which shows a typical response in NET1 when the flow rate is a step function (i.e., the flow rate is increased
from 0 to a finite amount at time 0). The dampened response of the network using MP indicates the fast
responsiveness of MP, making it suitable for dynamic environments. Because OPT cannot respond fast
enough to traffic fluctuations, it is impossible to find the optimal delays for dynamic traffic. However,
we can find a reasonable lower bound if the input traffic pattern is predictable, like the pattern shown in
Fig. 16, which shows only one cycle of the input pattern. To obtain a lower bound for this traffic pattern
that represents an 'ideal' OPT (one that has instantaneous response), we first obtain the lower bound for
each interval during which traffic is steady by running a separate off-line simulation with the traffic rate that
corresponds to that interval, and then combine the results to obtain the lower bound. It is with this lower bound
that we compare the delays of MP. Fig. 17 shows the average delays of the flows for OPT, MP and SP routing.
The results indicate that the delays of MP routing are again in the comparable range of the delays of an 'ideal'
optimal-routing algorithm.
Figure 11: Delays of MP and SP in CAIRN (panel: Comparison of MP and SP delays).

Ultimately, MP will be used in real networks where traffic is bursty at any time scale; therefore, it is
important to see how MP performs in that environment. We extracted 10 flows from the Internet traffic
traces obtained from LBL [21] and used them as input for the 10 flows in CAIRN. Fig. 18 shows the
delays for SP and MP. We do not perform this simulation with OPT because Internet traffic is too bursty for
OPT to converge. Observe that, except for flows 4, 6 and 8, delays of MP are much better than those of SP.
SP delays of these three flows are better than those of MP because of the uneven distribution of load
in the network and low loads in some sections of the network; in low-load environments SP can perform
slightly better than MP. This can be easily rectified by modifying IH to use a small threshold cost for the
best link, the crossing of which actually triggers the load-balancing scheme.

6  Conclusions 

We have presented a practical approach to near-optimal delay routing in computer networks. To overcome
the limitations of optimal routing algorithms, we proposed an approximation scheme and suggested
algorithms that implement various components of the approximation. The resulting framework is both implementable
in real networks and provides delays that are close to those obtainable using Gallager's
method. An important element of our framework is our generalization of sufficient conditions for loop-free
routing, which are applicable to any type of routing algorithm.

We presented one of many possible implementations of the new routing framework. In doing so, we
introduced the first link-state routing algorithm that provides multiple paths that are loop-free at every instant
and that need not be of equal cost. We have shown through simulations that our implementation of the
Figure 12: Delays of MP and SP in NET1.
Figure 16: Variable input traffic pattern.
(Figure panel: Comparison of delays under variable traffic.)
Figure 18: Delays under Internet traffic in CAIRN (panel: Comparison of MP and SP delays).

proposed framework performs significantly better than single-path routing, and that it offers delays that
are within a small percentage of the lower-bound delays under stationary traffic. The simulations are by
no means exhaustive, but the results clearly indicate that the framework offers the potential for obtaining
delays comparable to those of optimal routing.

Additional work is needed to study flow allocation heuristics that are better suited for specific end-to-end
services, e.g., trying to avoid out-of-order packets for certain flows. Furthermore, our new routing
framework opens up many interesting research opportunities for quality-of-service (QoS) routing, because
the loop-free invariant conditions on which it is based can be further constrained to satisfy different types
of service. Similarly, because the traffic allocation heuristics depend on local rather than global parameters,
new heuristics can be defined to account for QoS constraints.